EUROTRA - A European System for Machine Translation
1. Lessons from the past

Previous articles in this journal will have given the reader an idea of the state of the art in currently operational machine translation systems. This article describes a system which is planned, and which it is hoped will be developed by all the Member States of the European Community acting together, within the framework of a single collaborative project. The motivation for such a project is manifold.

First, we have learnt a great deal from the systems which already exist, both in terms of what to do and in terms of what not to do. To take the positive lessons first: the most important, of course, is that machine-aided translation is feasible. This lesson is extremely important. After the disappointments of the 1960s, it took a great deal of courage to persist in the belief that it was worthwhile working on machine translation. A great debt is owed to those who did persist, whether they continued to develop commercial systems with the tools then available or whether they carried on with the research needed to provide a sound basis for more advanced systems. Had it not been for their stubbornness, machine translation would now be one of those good ideas which somebody once had, but which proved in the end impractical, like a perpetual motion machine, for example, instead of being a discipline undergoing a period of renaissance and new growth.

Secondly, we have learnt that problems which once seemed intractable are not really so. Looking at a book on machine translation written in the early 1960s the other day, I was surprised to find the treatment of idioms and of semi-fixed phrases being discussed as a difficult theoretical problem. Of course, idioms must still be treated, and treated with care, but operational systems have shown us that they can be successfully translated.
This does not mean that no system will ever again translate "out of sight, out of mind" as "invisible idiot", but if it does so, it will be for lack of relevant data, not because the mechanisms for dealing with such phrases are inadequate. It would be possible to make a fairly extensive list of similar problems, which once gave machine translators nightmares but now only cause mild insomnia. Suffice it to say that experience with existing systems has given us the knowledge that such problems can be solved, and the courage to find ever better ways of solving them.

At a technical level, too, we have learnt a lot from existing systems. Early, not very successful, machine translation systems were dictionary based, essentially taking one word at a time and trying to find its equivalent in the target language. As a fairly natural reaction to the disappointing results obtained by such a method, there was something of a swing later to concentrating on the linguistic analysis parts of the system, those parts which tried to determine the underlying structure of the input text in order to translate at a "deeper" level. Practical experience has taught us that even though analysis is crucial, dictionaries retain a great importance, in that any working system will rely heavily on large dictionaries, sometimes containing whole expressions as single entries, rich in static linguistic information on each entry and serving as essential data for the translation process. So we have learnt to pay attention both to the initial design and coding of dictionaries, and to their manipulation in terms of large databases which must be constantly updated and maintained.

Based on rather more negative experience, we have learnt that system design is all-important in a machine translation system. This can be put rather differently, by saying that we have discovered that a translation system is necessarily going to be big, and that big systems need special treatment.
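The idea of a dictionary "containing whole expressions as single entries" can be sketched very simply. The following is a hypothetical illustration, not any actual EUROTRA mechanism: entries keyed on word sequences, looked up longest-match-first, so that a fixed phrase is translated as a unit rather than word by word. All entries and translations are invented.

```python
# Hypothetical sketch: a dictionary whose entries may be whole expressions,
# consulted longest-match-first so fixed phrases win over single words.
DICTIONARY = {
    ("out", "of", "sight"): "hors de vue",   # multi-word entry
    ("out",): "dehors",
    ("of",): "de",
    ("sight",): "vue",
    ("mind",): "esprit",
}

MAX_ENTRY_LEN = max(len(key) for key in DICTIONARY)

def translate(words):
    """Greedy longest-match lookup: prefer whole expressions to single words."""
    out, i = [], 0
    while i < len(words):
        for span in range(min(MAX_ENTRY_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + span])
            if key in DICTIONARY:
                out.append(DICTIONARY[key])
                i += span
                break
        else:
            out.append(words[i])  # unknown word: pass it through untranslated
            i += 1
    return out

print(translate("out of sight".split()))  # → ['hors de vue']
```

A real system would of course combine such lookup with morphological and syntactic analysis, but the point stands: the phrase is data, and adding another idiom is a dictionary update, not a program change.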
No one person, or even group of persons, can hope to keep a large computer program under control if it is written as an amorphous mass. It will be impossible, when things go wrong, as they inevitably do, to find out where in the program they went wrong, or why. It will be impossible for an outsider who has inherited the program from its original author(s) to understand what they did or why they did it. So a large program must be made as modular as possible: that means that it must be broken up into well-defined sections, each one with its task clearly known, together with the starting information it will work on and the results it can be expected to give. In addition, it must be well documented. It should be written in a computer language as easily readable and comprehensible as possible, and should be provided with an abundance of commentary explaining its function.

None of the above paragraph is specific to machine translation systems: indeed, its content is by now the received wisdom passed on even in elementary courses in computer programming. But one aspect of system design is particular to machine translation, and that is the absolute necessity of a rigid distinction between algorithms and data. This distinction, although it sounds esoteric, is in fact familiar to anyone who has ever followed a recipe. In their standard form, recipes give first a list of materials required and then a set of instructions saying what to do with these materials. The list of materials corresponds more or less to the data, the list of instructions to an algorithm. In the case of analysing language, the data will consist of, for example, dictionary information and a description of syntactic or semantic grammar rules, whilst the algorithmic part of the system consists of instructions about how to apply the rules and the dictionary information to a text in order to determine its structure.
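The algorithm/data split described above can be made concrete with a minimal sketch. In this invented example (the categories and rules are illustrative, not EUROTRA's), the grammar rules are pure data, and the algorithm that applies them is a generic reducer which mentions no particular word or language:

```python
# Hypothetical sketch of the data/algorithm separation: linguistic facts
# live in tables; the algorithm only knows how to apply rules.

# Data: word categories and rewrite rules (right-hand side -> category).
LEXICON = {"the": "ART", "old": "ADJ", "mill": "N"}
RULES = [
    (("ART", "ADJ", "N"), "NP"),  # article + adjective + noun -> noun phrase
    (("ART", "N"), "NP"),
]

def parse(tokens):
    """Generic bottom-up reducer: repeatedly replace any rule's right-hand
    side with its category. Note it never names a specific word or rule."""
    cats = [LEXICON.get(t, "UNK") for t in tokens]
    changed = True
    while changed:
        changed = False
        for rhs, lhs in RULES:
            n = len(rhs)
            for i in range(len(cats) - n + 1):
                if tuple(cats[i:i + n]) == rhs:
                    cats[i:i + n] = [lhs]
                    changed = True
                    break
            if changed:
                break
    return cats

print(parse(["the", "old", "mill"]))  # → ['NP']
```

Extending coverage to a new construction means adding a line to `RULES` or `LEXICON`; the `parse` function itself never changes. That is the discipline the recipe analogy is pointing at.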
There is a constant temptation to mix up the two: to include inside a dictionary entry, for example, a little instruction to go and look for a particular dictionary entry following this one, or to put into the algorithm trying to find noun groups the information that adjectives come (sometimes) between articles and nouns. The consequences of falling into this temptation can be very nasty indeed. The most common consequence is that it becomes impossible, eventually, to change the system in order to correct mistakes or to enlarge the range of texts it can deal with. A relatively minor change, meant to deal with one specific linguistic feature, may affect the treatment of other features in quite unforeseeable ways. So information and what to do with it should be kept apart.
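The second temptation mentioned above, burying a linguistic fact in the algorithm, can be illustrated with an invented contrast (neither function is from any real system). The bad version hard-codes the fact that adjectives may come between articles and nouns into its control flow; the good version keeps the patterns in a table, so widening coverage is a data change:

```python
# Hypothetical illustration of mixing data into an algorithm, and the cure.

def find_np_bad(cats):
    """Anti-pattern: the linguistic facts are buried in the control flow,
    so every new construction means editing (and possibly breaking) code."""
    if cats[:3] == ["ART", "ADJ", "N"]:  # adjectives come between articles and nouns
        return 3
    if cats[:2] == ["ART", "N"]:
        return 2
    return 0  # silently fails on e.g. ART ADJ ADJ N

# Data: noun-phrase patterns, longest first.
NP_PATTERNS = [["ART", "ADJ", "ADJ", "N"], ["ART", "ADJ", "N"], ["ART", "N"]]

def find_np_good(cats):
    """Same job, but the patterns are data: supporting ART ADJ ADJ N
    required adding one pattern, not touching this function."""
    for pat in NP_PATTERNS:
        if cats[:len(pat)] == pat:
            return len(pat)
    return 0

print(find_np_good(["ART", "ADJ", "ADJ", "N"]))  # → 4
print(find_np_bad(["ART", "ADJ", "ADJ", "N"]))   # → 0
```

The failure mode of the bad version is exactly the one described in the text: a "relatively minor change" to handle one construction means rewriting tangled conditionals, with unforeseeable effects on the cases that already worked.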